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Abstract 

We present a new multigrid scheme for solving the Poisson equation with 
Dirichlet boundary conditions on a Cartesian grid with irregular domain 
boundaries. This scheme was developed in the context of the Adaptive Mesh 
Refinement (AMR) schemes based on a graded-octree data structure. The 
Poisson equation is solved on a level-by-level basis, using a "one-way inter- 
face" scheme in which boundary conditions are interpolated from the previ- 
ous coarser level solution. Such a scheme is particularly well suited for self- 
gravitating astrophysical flows requiring an adaptive time stepping strategy. 
By constructing a multigrid hierarchy covering the active cells of each AMR 
level, we have designed a memory-efficient algorithm that can benefit fully 
from the multigrid acceleration. We present a simple method for capturing 
the boundary conditions across the multigrid hierarchy, based on a second- 
order accurate reconstruction of the boundaries of the multigrid levels. In 
case of very complex boundaries, small scale features become smaller than 
the discretization cell size of coarse multigrid levels and convergence prob- 
lems arise. We propose a simple solution to address these issues. Using 
our scheme, the convergence rate usually depends on the grid size for com- 
plex grids, but good linear convergence is maintained. The proposed method 
was successfully implemented on distributed memory architectures in the 
RAMSES code, for which we present and discuss convergence and accuracy 
properties as well as timing performances. 

Keywords: Poisson equation; Multigrid methods; Adaptive mesh 
refinement; Elliptic methods 
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1. Introduction 



Elliptical partial differential equations play a central role in many math- 
ematical and physical problems. The Poisson equation, in particular, arises 
naturally in the context of electromagnetism, fluid dynamics and gravity. It 
is therefore of great significance in astrophysics, of which those fields are es- 
sential theoretical aspects. In the classical Newtonian limit, one can relate 
the gravitational potential $ to the matter distribution (local density) p by: 



Applications in computational astrophysics therefore often require solving a 
discretization of Eq. Q on some discrete grid. The ubiquity of the Poisson 
equation in science has led to the development of many suitable numerical 
solving techniques, for a wide range of specific applications and problems. 

In some cases, one can solve for the discretized potential directly from Eq. 
([!]): the convolution theorem allows to solve the Poisson equation directly 
in Fourier space using the Green function of the Laplacian operator on the 
grid of interest. In computational applications, this is usually done using 
the Fast Fourier Transform (FFT), which is particularly suited to periodic 
boundary conditions problems on rectangular, regular grids. This direct 
method is very effective: on a grid of linear size N, the computing time in 3 
dimensions is of order 0(N 3 log N), and yields the solution to the discretized 
problem within numerical roundoff accuracy. The Green function method 
can be generalized to non-periodic boundary conditions and adaptive grids 
on rectangular domains [see e.g. [S]. 

Another class of Poisson solving schemes are relaxation methods, such 
as Gauss-Seidel or successive over-relaxation (SOR) [see e.g. [22]. Unlike the 
Green function method, those schemes are iterative, and rely on an initial 
estimate ("first guess") of the solution. The estimate is then successively im- 
proved by iterative damping of the residual r = A$ — p. Relaxation methods 
are generally simple to implement, and suitable for a wide range of elliptic 
problems. Methods such as Gauss-Seidel or SOR only require the forward 
computation of the differential operator, which is usually inexpensive, and 
makes it possible to use such relaxation methods on complex grid geometries. 
Krylov subspace methods [see e.g. [25], such as the Conjugate Gradient (CG) 
method, are another class of iterative methods which can be used to solve 
Eq. ([l]) in the form of the optimization problem: 



A$ = p 



(1) 



min || A$ — p 



(2) 
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Classical iterative methods have gained in interest in the last decade 
with the advent of massively parallel computing. They only require one-cell 
thick, boundary layer from the neighboring processor, while FFT involves 
a massive transpose of the whole volume of data. Moreover, one does not 
need the exact solution of the discretized Poisson equation that the FFT 
provides: the error in the solution is already dominated by truncation errors 
of a fraction of a percent for practical applications. Iterative methods can 
therefore be stopped quite early, when the residual norm has dropped well 
below the truncation errors. Still, classical iterative methods can be very 
time consuming, especially when the grid size is increased. Usually, the large 
scale modes are the slowest to converge, and the number of iterations required 
to reach a given level of residual usually increases with the number of grid 
points per dimension [22]- 

The convergence rate of iterative relaxation methods, such as Gauss- 
Seidel, can be dramatically improved by multigrid acceleration [21 EI]- Multi- 
grid methods use a hierarchy of discretizations of decreasing spatial resolu- 
tion. At the resolution of the initial problem (on the finest grid), the solution 
is iteratively improved by subtracting corrections obtained from the coarser 
grids. This ensures that the large-scale modes of the residual are efficiently 
damped, while a traditional smoothing relaxation scheme takes care of the 
small-scale modes of the error. In ideal cases (e.g. for smooth problems with 
simple boundary conditions on rectangular domains), multigrid methods can 
exhibit remarkable convergence properties: the residual norm decreases at a 
constant rate during the whole convergence process (down to machine pre- 
cision). The convergence rate can be very fast (see examples below), and 
more importantly it does not depend on the size of the grid. This last prop- 
erty is perhaps the most important one, since it allows one to consider very 
large computational problems: for a given level of convergence in the resid- 
ual norm, optimal multigrid methods are linear in the problem size N 3 . This 
complexity is lower than the one offered by the FFT, and although multigrid 
is still slower than optimized FFT for small configurations, it can be quite 
competitive for large parallel computations and reasonable stopping criteria. 
In contrast to traditional iterative schemes, the convergence properties of 
multigrid are unaffected by the choice and quality of the first guess. Another 
reason to use multigrid is that it couples nicely to adaptive Cartesian grids, 
while FFT-based methods in this case are still possible, but quite complicated 
[see e.g. HI [HJ 123]- Last but not least, while FFT requires a Poisson equation 
with constant coefficients, multigrid methods, because they are defined in 
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real space, can be adapted to solve Poisson equations with non-uniform coef- 
ficients, or even non-linear equations. This property is particularly useful for 
problems with complex boundaries [21], multiple incompressible fluids [24], 
or models with modified gravity [121 [23, ITB]). 

The objective of this paper is to implement a simple and efficient multi- 
grid Poisson solver for a Cartesian grid with arbitrary domain boundaries. 
Complex boundary conditions are usually found in fluid mechanics problems 
featuring immersed bodies or complex multiphase flows [271 [251 03] • Here, 
our motivation is different: we would like to design a Poisson solver for the 
class of Adaptive Mesh Refinement (AMR) codes for which the mesh is de- 
fined on a "graded octree" [T31 1221 [21] • The octree mesh structure ensures 
that the geometry of the grid closely adapts to the properties of the flow, 
without the traditional overhead associated to the large rectangular patches 
of block-structured AMR. The consequence is however that the mesh of a 
given AMR level can have a very complicated shape, with holes and irregular 
boundaries. If one solves the Poisson equation on the whole AMR hierarchy, 
this translates into a modification of the Laplacian operator at fine-coarse 
grid boundaries, but the corresponding multigrid solver remains similar to 
its Cartesian grid equivalent [see e.g. ITT)] . 

In most astrophysical applications, it is however almost always impracti- 
cal to solve the Poisson equation on the whole grid at once: self-gravity has 
the effect of dramatically increasing the dynamical range of the density field, 
leading to very different characteristic timescales within the same computa- 
tional domain. Advancing the whole system with one single time step can 
be very inefficient, since only a very small fraction of the volume actually 
needs high accuracy in the time coordinate. This is particularly true for cos- 
mological simulations, where most of the computational volume is covered 
by low density regions that evolve slowly, while a small number of highly 
resolved cells sample dense regions (such as galaxies), where the dynamical 
time scale is very short. Most AMR codes address this problem by using 
adaptive time stepping, where a given fine level is updated more frequently 
than its coarser level, ensuring that the actual timestep remains close to the 
natural CFL timestep (so that the whole AMR grid is not updated more 
often than needed). In RAMSES, we typically update a finer level twice as 
often as its coarser level. As a consequence, consecutive fine and coarse levels 
are only synchronized at every other fine timestep. 

Whenever all the AMR levels are synchronized, it is possible to solve the 
Poisson equation on the whole grid at once. Such a Poisson solver is called a 
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"two-way interface" scheme, as the information is propagated both from the 
coarse grids to the finer grids and back: the coarse grids "feel" the effect of 
the finer grids. Many multigrid solvers have been successfully implemented 
in this case [TQ1 ED EE! Popinet [2T] has developed an incompressible 
flow solver with complex boundary conditions, using an octree data struc- 
ture. The solver features a "half V-cycle" multigrid schedule, which makes it 
possible to perform multigrid iterations within the existing AMR hierarchy, 
with no additional storage. This pioneering work, however, requires the use 
of a single global time step for the whole AMR grid. Most of the time, fine 
and coarse levels are not synchronized, and it is not possible to solve on the 
whole grid at once. It is therefore necessary to resort to a "one-way interface" 
scheme [see El EE31 EE1 EES]- In the traditional "one-way interface" approach, 
the AMR levels are updated separately: the coarsest level is updated first, 
then the computed coarse potential is used to impose boundary conditions on 
the finer levels, which can then be solved separately and more frequently. The 
gravitational potential on finer AMR levels is solved by imposing Dirichlet 
boundary conditions at the finer level boundary, enforcing potential conti- 
nuity. Because we are dealing with an arbitrarily complex octree structure, 
we must be able to solve the Poisson equation on a given refinement level 
with arbitrarily shaped domains. The problem we would like to address here 
is therefore to solve the Poisson equation using the multigrid method, with 
the constraint that Dirichlet boundary conditions are imposed on the outer 
cell faces of an arbitrary domain (see Fig. [I]). Note that using adaptive time 
stepping and one-way interface poses an time synchronization problem: as 
the finer levels are stepped forward, the boundary conditions derived from 
the coarser levels fall out of synchronization with the time at finer levels. 
The problem of non-synchronized AMR levels has been approached in the 
context of incompressible flows [T7] and coupled collisional and collisionless 
systems for cosmological applications [19J. A proposed solution is to update 
the coarse levels first and obtain the fine boundary conditions by both time 
and space interpolation of the coarse values. However, this requires storing 
the time derivatives of the potential at every level. In addition, in order to 
obtain a fully multilevel solution, the coarse levels should take into account 
the solution on the finer levels. Whenever the coarse levels are updated 
first, it is therefore necessary to correct their values with an estimate of the 
difference between the single level and multilevel solutions, before the finer 
levels are updated. Such an approach has been used for example by Martin 
and Colella [T7] , Miniati and Colella [IH] in order to provide corrected coarse 
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boundary conditions for the time interpolation. RAMSES uses a different 
approach: finer levels are updated first, using as boundary conditions val- 
ues which are not extrapolated in time but fixed at the initial coarse time 
step. Once the fine levels have been stepped forward in time, RAMSES up- 
dates the matter density in the coarse cells by binning the fine children cells 
into the coarse level density with a CIC scheme around their center of mass. 
The information from the fine evolution thus flows back up to the coarse 
levels. This scheme, however, does not guarantee the time accuracy of the 
time interpolation and correction methods. Note that the solver presented 
in this paper applies equally well to any scheme used to derive the boundary 
conditions, as it handles each AMR level independently. 

As Cartesian grids are much more practical than unstructured meshes, 
the accurate description of boundaries immersed in Cartesian grids have been 
the focus of much effort, especially in computational fluid dynamics [see e.g. 
[TJ[32]- This approach has also led to the development of sophisticated Poisson 
solvers, some relying on multigrid acceleration [TUt [T5]. 

Our goal is to present a simpler but efficient solver, suitable in particular 
to the case of tree-based AMR codes with a one-way interface scheme. We will 
apply our new scheme to the AMR code RAMSES, a self-gravitating fluid 
dynamics code for astrophysical applications [23]. The octree structure is 
based on the "Fully Threaded Tree" approach proposed by Khokhlov [TT] for 
which cells are split into "octs" (groups of 8 cells) on a cell-by-cell basis [T3J 
EB]- The paper is organized as follows: we first describe our basic multigrid 
algorithm for the Poisson equation with Dirichlet boundary conditions. We 
then detail how we capture complex boundary conditions at coarse multigrid 
levels within our framework. We report a very fundamental issue of our 
approach (namely the "small islands" problems) and we propose a simple fix 
to overcome it, and we finally discuss the performance of our algorithm. 

Note that we only address the issue of the boundary reconstruction for 
coarsened multigrid levels: for one-way interface AMR applications, at the 
finest level (i.e., the AMR level to solve), the boundary is usually located 
along faces of Cartesian grid cells constituting the fine AMR level, and no 
cut-cell or level set reconstruction is needed in this case. However, we believe 
our method can be adapted to a cut-cell or level set boundary at the fine 
level, provided an appropriate fine operator is defined. 
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2. A Multigrid algorithm for complex boundary conditions 



Multigrid methods combine a relaxation solver (usually a smoother such 
as Gauss-Seidel or Jacobi sweeps) and a multi-resolution approach, to ensure 
efficient damping of both small and large scale components of the residual 
(see for example [HI] for an introduction). 

For one-way interface schemes (in the RAMSES code for example), the 
Poisson equation is solved over the whole AMR hierarchy on a level-by-level 
basis, the size of the cell involved in each individual Poisson solve being uni- 
form. Each AMR level constitutes the finest level of the multigrid hierarchy. 
The finest multigrid level is therefore defined by a Cartesian grid with com- 
plex irregular boundaries. We then build a hierarchy of coarser grids which 
cover this reference domain (see Fig. [2] for an illustration). This new hier- 
archy defines the multigrid structure for the current level. Although it is 
also based on an octree structure similar to the underlying AMR grid, it is a 
different structure that does not interfere with the coarse AMR levels (which 
are, in a way, "orthogonal" to the multigrid levels). 

One major advantage of using this secondary grid hierarchy for each Pois- 
son solve is that the computational cost of the multigrid solver at a given 
level only scales as the number of cells in that level, as we use only a subset of 
the AMR coarser cells. This is especially crucial for astrophysical problems, 
where fine grids can occupy a fraction of the volume much smaller than the 
coarser levels. 

Let us consider the Poisson problem, discretized on a fine grid domain 

n /: 

A f $f = p f on tt f , 
Dirichlet conditions on dQf. 

where the / subscripts denote the discretized Laplacian operator and fields 
on the fine grid. For a single multigrid iteration, we follow the traditional 
recipe: 

1. Perform N pre Gauss-Seidel smoothing passes on the current estimate 
$/ of the fine solution, 

2. Compute the fine residual 

r f ^— A$, - p, 
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Figure 1: Poisson problem at the fine level for a given AMR level. The solution potential to 
be determined is defined at the cell centers (dots). The boundary conditions are defined 
at the border cell faces (crosses). The boundary values are linearly interpolated from 
the solution at the coarser AMR level (not represented), which was previously computed 
because of the one-way interface scheme. 




Figure 2: Levels of the multigrid hierarchy for the problem of Fig. [T] 
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3. Compute the fine-to-coarse restrictions of the solution and residual: 



r c < — R(r f ), 

4. Define a coarsened domain Q c and boundaries dfl c from Qf. The pre- 
scription for deriving dQ c from the fine level is the key issue, and will 
be discussed in section [U 

5. Compute a coarse correction 5$ c , by performing N cyc i es multigrid iter- 
ations for the following coarse problem: 

A c (5$ c ) = -r c on tt c 
5$ c = on dfi, c , 

6. Prolong <5<3> c into a fine correction 

S$ f <_ F(<?$ c ), 

7. Correct the fine level solution 

$/• < — $/ + 5$/, 

8. Perform N post smoothing passes on 

This multigrid iteration is repeated on the fine grid until the norm of the fine 
residual is considered small enough. To fully specify the scheme, one needs 
to specify integers N pre , N post and N cydes , the restriction and prolongation 
operators R and P, and the discretization of the Laplacian operator. 

In our case, we have found N pre = N post = 2 to yield the best perfor- 
mance in terms of solver time for our applications in RAMSES. We will now 
discuss the prolongation, restriction and discretized Laplacian operators. We 



postpone the discussion of the parameter N cydes to section 5.3 

We use a finite difference approach to discretize the Poisson equation on 
the Cartesian grid. Inside the domain, away from the boundaries, we use the 
standard 7-point discretization of the Laplacian operator: 
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Figure 3: Common first-order and second-order (respectively left and right) restriction 
and prolongation schemes (respectively top and bottom). Ri simply averages the values 
of the subcells to obtain a coarse value. Pi assigns the same coarse value to the all the 
fine subcells (straight injection). Our multigrid scheme uses R± and P2. 

where the sum extends over all the 6 pairs of neighboring cells, and 

Ax is the size of the cells on which the Laplacian is evaluated. This Lapla- 
cian operator is second-order accurate and our multigrid implementation is 
currently restricted to this case. 

For the relaxation smoother, we use a Gauss-Seidel smoother with red- 
black ordering, with the Laplacian operator as defined above. With red-black 
ordering, the cells are updated in two passes, each pass running over a half 
of the domain following the colors of a checkerboard pattern. 

Prolongation and restriction operators define how the problem is down- 
sampled to a coarser level (restriction) and how the coarser level correction 
is interpolated back to the finer level (prolongation). 

A common rule of thumb for the choice of prolongation and restriction 
operators for the Poisson equation is the order condition rip + riR > q, where 
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np and np are the prolongation and restriction operator orders, respectively, 
and q is the order of the Laplacian operator [see EU E2] ■ Simple standard 
operators are shown in Figure [3j Straight injection is defined as PI, while its 
transpose, Rl, boils down to a simple averaging. These operators are first- 
order in space (np = 1 and np = 1). Second-order schemes can be easily 
designed, like trilinear interpolation (noted here P2) and its transpose (R2), 
whose weights are shown in Figure [3j 

We have settled for simple averaging for restriction (Rl), and linear inter- 
polation (P2) for prolongation (see Fig. [3]), as this choice turned out to yield 
the best convergence rates for the classes of grids which we have studied. Ad- 
ditionally, as will be discussed in Section |4| picking simple averaging as the 
restriction operator turns out to be particularly convenient in conjunction 
with our prescription for boundary representation. 

3. Second-order multigrid boundary reconstruction and the mod- 
ified Laplacian operator 

In our multigrid algorithm, the key ingredient is the mathematical rep- 
resentation of the domain boundary at coarse multigrid levels. Because our 
goal is to take advantage of the Cartesian grid structure of the AMR scheme, 
we restrict ourselves to Cartesian grid interface capturing techniques. Among 
the many different solutions proposed in the literature, the cut cell [10J and 
the level set [271 1201 El approaches stand out as simple and efficient techniques 
for interface reconstruction. The multigrid algorithm for complex boundary 
we propose here can be easily implemented with any of these techniques as 
the basic interface reconstruction scheme. 

In what follows, we however simplify the problem even further, since we 
work in the framework of a one-way AMR Poisson solver. At the finest level 
(i.e., on the actual grid where the Poisson equation is to be solved), the 
boundaries dQ are simply positioned at the faces of the outer cells them- 
selves. In the level set approach, the boundary could have been placed at the 
zero level of some interpolated distance function [see e.g. [7J. Since comput- 
ing distance function comes with a cost, we have considered here a simpler 
approach based on a coloring scheme: inner cells are initialized with a mask 
function with value rrii — 1 and outer cells with a mask = —1. The 
domain boundary dQ is defined at the position where the interpolated value 
of the cell-centered mask (m 8 6 [—1, 1]) crosses zero. At the finest multigrid 
level, these interpolated positions are at the cell faces, as desired. 
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m < 




Figure 4: The Laplacian stencil near boundaries: the center of the leftmost cell lies outside 
the domain, and therefore has a negative mask value m < 0. For this cell, a ghost value 
will be reconstructed as shown on Fig. [5] For all the three other cells, the actual cell 
center values are used unchanged. 

We now compute the mask value at coarser multigrid levels by simply 
averaging recursively the mask value at the finest level (as for the residual). 
Here again, if one wants to use a distance function to capture the boundary, 
one could compute the coarser representation of the distance function also 
by simple averaging. At each level I in the multigrid hierarchy, the domain 
Q e is defined as the set of cells for which m\ > 0. The use of the Rl operator 
maintains the correct location of the boundary as much as possible, without 
spreading the boundary information too much across neighboring coarse cells. 

In order to reconstruct the location of the multigrid boundary, and solve 
for the corresponding Poisson problem, we use a prescription similar to the 
one described in Gibou et al. [7] . The idea is to redefine the Laplacian oper- 
ator close to the domain boundary. Whenever one of the neighboring <E>j has 
rrij < (and therefore lies outside £1, see Fig. [4j, it is replaced by a ghost 
value $j linearly extrapolated from $j and the boundary value. This ghost 
value depends explicitly on $, and the boundary condition. A ID illustration 
is presented on Figure [5] Since this prescription uses a linear reconstruction 
for the multigrid boundary in each direction, the boundary position is recov- 
ered at second order spatial accuracy. As in [7J, the boundary is positioned 
in each direction independently, and the ID case trivially generalizes to 3D. 

The coarsening of the boundary from dfl f to dfl c is now fully specified by 
the restriction operator used for the mask and by the modified Laplacian 
closed to the boundary. These are the 2 key ingredients in our multigrid 
scheme: the boundary remains at the same location (up to second order in 
space) when we go from fine to coarse levels in the multigrid hierarchy. The 
boundary condition $ = is therefore imposed at the correct location, so 
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Figure 5: Reconstruction of ghost cell values across boundaries. Cells with negative mask 
(white dots at centers) are outside of the domain and do not carry a valid <& value. The 
computation of the Laplacian at cell i requires the use of a ghost value which is 

obtained from <l>i by linearly extrapolating the boundary condition. In accordance with 
the level set idea, the boundary is positioned at the point where m crosses zero, which is 
found by linearly interpolating the mask values. 

that the coarse solution of the Poisson equation with the coarse boundary 
corresponds to the solution of the Poisson equation at the finer level with 
the fine representation of the boundary. 

We speculate that the chosen multigrid boundary reconstruction scheme 
should satisfy an order condition similar to the prolongation and restriction 
operators [211 122] • Although a mathematical proof is beyond the scope of this 
paper, our numerical experiments suggest that this is indeed the case. We see 
in Figure [7] that, in case of a smooth boundary, our second-order multigrid 
boundary reconstruction scheme results in a perfect multigrid convergence^ 
while the first-order scheme doesn't. As we will now discuss, in case of very 
complex boundaries, second-order reconstruction of the multigrid boundary 
is not possible anymore, and we have to degrade our scheme to first-order to 
ensure convergence. 

4. First-order multigrid boundary reconstruction and the "small 
islands" problem 

We see in Figure [7] that a different type of boundary can lead to a catas- 
trophic divergence of our multigrid scheme (the blue line shows the residual 
norm evolution in case of second-order multigrid boundary reconstruction). 



1 Multigrid convergence means here that the damping rate of the residual norm does 
not depend on the grid size. 
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Figure 6: Coarsening (restriction) may cause loss of boundary conditions. The Dirichlet 
BCs around an isolated cell with m = — 1 among a sea of m = 1 cells are lost after one 
coarsening step, as the resulting coarse mask is positive everywhere. 



This rather complex boundary condition is typical of AMR grids in cosmo- 
logical simulations [28J. It features many small disconnected domains that 
cluster in a large central region with many "holes". Successive coarsening 
of the grid resolution leads in this case to a loss of boundary conditions, 
especially when "holes" or "small islands" are present. Indeed, in this case, 
the finer boundary small scale features cannot be represented anymore on 
coarser grids. Figure [6] illustrates such for which the fine grid still 

has a cell with a negative mask value, so that the $ = boundary condition 
still applies, but the coarse grid has cells with only positive values. Because 
of this topological change in the boundary representation, the coarse and 
fine solutions of the Poisson equation become significantly different. Spuri- 
ous eigenmodes associated to this loss of constraints at the boundary are not 
damped quickly enough by the smoothing operator at the fine level: this leads 
to slower convergence rates and, in some case, to catastrophic divergence (see 
Fig. [7}. 

One previously proposed solution to this problem is the subtraction of 
the divergent modes introduced by these small islands, for example by re- 
combining the multigrid iterants [3j. This however turns out to be too com- 
putationally and memory intensive for us to use in the Poisson solver of 
the RAMSES code: it requires storing a large number of previous solutions. 
This recombination method also tends to perform worse when the number 
of islands increases, which would likely render it useless in the case of the 
complex grid structure encountered in cosmological simulations. 

Another proposed solution is to stop coarsening the residual in the multi- 
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grid hierarchy whenever such a problem occurs during multigrid coarsening 
|lUj . This degrades the performance of the multigrid method, since large 
scale modes are no longer damped efficiently. In the case of very complex 
boundaries with small islands, like in Figure [7j the level at which the hierar- 
chy has to be truncated is so close to the finest level that the whole multigrid 
approach breaks down. 

In order to overcome these limitations, McCorquodale et al. [IB] proposed 
to keep track of the boundary representation at the fine level, and modify 
the Laplacian on coarse grids, whenever the operator stencil crossed this fine 
boundary. Stencil nodes which cannot be reached from the stencil center 
by a straight segment without crossing the boundary are excluded, and an 
asymmetric stencil is used, chosen in function of the local configuration of 
the interface. 

We propose here a simpler approach, based on the observation that these 
small islands correspond to local minima in the color function m; (or equiv- 
alently in the distance function if needed). The averaging scheme will tend 
to smooth these extrema, resulting in the disappearance of the negative val- 
ues (see Fig. [6]) and in the apparition of spurious boundary conditions. In 
analogy to traditional high-order numerical schemes for hyperbolic systems 
of conservation laws [3U1 IS], we solve the problem by switching globally to a 
first-order multigrid boundary reconstruction scheme. We impose the $ = 
constraint at the coarse level on the grid point nearest]^] to the interface: in 
practice, for each cell for which the mask value rrii < 1, we impose = 0. 
The simple averaging restriction for the mask ensures that cell i has rrii < 1 
if and only if it has a non-zero intersection with the boundary. 

We have tested our first-order multigrid boundary reconstruction scheme 
on 2 extreme types of boundary conditions : a simple disk with no hole 
or island, and a complex clustered grid, typical of AMR cosmological sim- 
ulations. We have compared its convergence properties to the second-order 
reconstruction scheme in Figure [7j In the disk case, the simple topology 
of the boundary allows us to take full advantage of the second-order re- 
construction of multigrid boundaries, resulting in fast convergence rates (in 
blue), totally independent of the problem size. This shows that our overall 
scheme features a true "textbook" multigrid convergence. Using the first- 
order multigrid boundary reconstruction, we degrade the convergence rate 



2 The so-called NGP scheme for Nearest Grid Point. 
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Figure 7: L 2 norm of the residual as a function of iteration number in our multigrid 
Poisson solver in 2D for both presented boundary capturing schemes. For each domain 
shape shown in the insets, the residual for both 2 nd order (in blue) and 1 st order (in red) 
boundary capturing schemes is presented for a 64 2 grid up to a 1024 2 grid. 

significantly, and it now depends slightly on the problem size. This is ex- 
pected since the position of the boundary at two consecutive multigrid levels 
can differ by half a cell width, and this first-order error in the boundary 
positioning limits the accuracy of the coarse correction. 

In the complex boundary case, with many holes and islands, our second- 
order reconstruction fails, as we can see from the divergent behavior of the 
residual norm in Figure [7j We interpret this catastrophic behavior as follows: 
topological changes in the boundary representation at coarse levels result in 
spurious long-range mode that are not damped by the smoothing operators 
at finer levels. If we use first-order multigrid boundary reconstruction, we 
observe that the convergence is restored: although first-order reconstruction 
of the boundary introduces small-scale errors close to the boundary, these 
errors are efficiently damped by the smoothing operator at finer levels. Note 
however that the convergence rate is slower than in the disc case, and that 
the convergence rate now depends on the grid size. A similar effect was 
observed in [HI EI] ■ 

Note that in our present implementation, we have to decide beforehand to 
which order we wish to reconstruct the multigrid boundaries. We have tried 
to implement an adaptive algorithm, for which the reconstruction scheme is 
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adapted locally to the topological changes of the boundary, but no satisfac- 
tory results were found. This then raises the question: how do we choose 
between first or second order reconstruction, when solving the Poisson equa- 
tion on an arbitrarily complex grid? It may be possible to decide from a 
topological analysis of the grid (such as its genus and the number of con- 
nected components), but this proves difficult to achieve in practice. 

We have opted for a pragmatic approach, and decide at run time which 
scheme to employ for each AMR level independently. When we start solving 
the Poisson equation on a given level, we first try using the second-order 
reconstruction. We monitor the convergence rate during the iterations, and 
if it becomes slower than a fixed threshold, i.e. if ||r n+ i|| 2 / ||r n || 2 > r\ with 
typically r\ = 0.5, we switch to the first-order scheme for that level only 
and for the next 10 solves. With our current implementation, if the AMR 
grid is really complex (with small islands), the solver only wastes a couple 
of iterations before deciding on which of first or second-order gives the best 
convergence rate. This works very well in practice, even in cosmological 
simulations featuring clusters and filaments, as only a few intermediate AMR 
levels exhibit very complex topologies. 



5. Accuracy and performance tests 

5.1. Accuracy tests 

We have tested the accuracy of the solver by comparing numerical solu- 
tions computed using RAMSES to exact analytical expressions. We chose a 
2D setup inspired from galaxy simulations, with a radial mass distribution 
centered in the computational box. With coordinates (x,y) G [0, l] 2 the ra- 
dial coordinate is given by r 2 = (x — |) 2 + (y — \) 2 . We take the analytic 
potential $ e (r) — subscripted with e for exact — to be 



$ e fr) := In 



- r - 








r - 





(4) 



which is smooth everywhere in the domain and features a core at the origin. 
The parameter tq controls the concentration of the profile. The corresponding 
density profile p e {r) is obtained from Q: 

Pe(r) = (5) 
(r 2 + rg) 
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which is positive and smooth everywhere. The analytical expression for the 
force intensity f e = |V$| is 

m = ^ w 

and the force vector everywhere points towards the center of the box. The 
boundary conditions at the border of the computational box are set using 
the exact solution Q. 

Using this setup, we have computed the truncation error for the potential 
and the force as a function of the finest grid resolution for both a Cartesian 
grid and an AMR grid with 3 additional AMR levels. For the resolution 
level £, the finest cell size Ax in the grid is given by Ax = (l/2) e . We use 
a quasi-Lagrangian mesh refinement strategy, closest to what is commonly 
employed in astrophysical applications: given a fixed threshold mass M, each 
cell containing a mass exceeding 2 Ndim M is refined (split) into 2 Ndim children 
cells. 

In our test, whenever the resolution is increased by 1, the base (coarsest) 
AMR level is incremented by 1, and the mass resolution M is divided by 
4 (in 2D). This procedure allows us to increase the resolution of the finest 
AMR cells by exactly one level, while keeping the depth of the AMR grid a 
constant. We initially pick the value of M such that it triggers the refinement 
of 3 AMR levels. The resulting grid for resolution level 7 is shown on Fig. |9j 

We have evaluated the truncation errors for both the potential and the 
gravitational force in order to test the one-way interface scheme. Once a solu- 
tion for the potential has been obtained at for a given AMR level, RAMSES 
computes the force for this level using a 5-point finite difference approxima- 
tion of the gradient: 

= 4 $ i+1 - _ 1 9 M , - 4 

3 2A.x 3 4Ax vv ; ; v j 



The truncation errors are presented on Fig. |8j For the potential, both 
Cartesian and AMR grids feature second-order convergence. For a given 
resolution level, the AMR truncation error is larger than the corresponding 
Cartesian error: this is expected, since AMR will only use the maximal 
resolution in particular areas of the grid, leading to bigger truncation errors 
because of the presence of coarser cells. 
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Figure 8: Truncation error on the potential (left) and force (right) for the setup described 
in the text, as a function of the maximal resolution level I. The finest cell size Ax is given 
by Ax = (l/2) £ . The solid lines are 0(Ax 2 ) (second order), the dashed line is O(Ax) 
(first order), and the dotted line is 0(Ax 16 ). 



Figure 9: AMR grid in the accuracy test case described in the text, at resolution level 
7. The base grid is at level 4, with 3 AMR levels reaching down to level 7 using a quasi- 
Lagrangian refinement strategy. 
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Figure 10: Computation of the gravitational force in RAMSES using the finite difference 
approximation ([7]). At a fine-coarse interface, the missing fine values (empty circles) are 
computed from the coarse level potential by linear interpolation. Prior to solving for <f> 
at the fine level, the boundary values for the fine Poisson problem (blue crosses, see also 
Fig. IT]) had been computed in the same manner. 



Similarly to the potential, in the Cartesian case, the truncation error of 
the force decreases at second order. In the AMR case however, the conver- 
gence is degraded to first order for the L ro norm. We attribute this effect to 
the one-way interface scheme. Whenever the finite difference approximation 
of the gradient crosses the level boundary, RAMSES fills in the missing values 
by linear interpolation from the coarser level, as illustrated on Fig. 10 Since 
the coarse Laplacian operator and linear interpolation are both accurate to 
second order, the interpolated values (empty circles on Fig. 10) have a trun- 
cation error of the form (Ax) 2 €i, where 6{ = 0(1). Note that accounts for 
both the coarse Laplacian and interpolation errors. The valid values at the 
fine level (solid dots on Fig. 10) are accurate to second order, with a trunca- 
tion error (Ax) 2 r]i, with rji = 0(1). Across the interface, the truncation error 
jumps abruptly from (Ax) 2 €i to (Ax) 2 r]i. Since there is no reason for e, and 
r\i to connect to the same value at the coarse-fine boundary, the difference 
formula ([7]) will produce a 0(1/ Ax) spike on the derivative of these terms. 
This translates to a 0(1/ Ax) (Ax) 2 = O(Ax) error on the gradient of $. 

This problem was previously noted in the case of AMR solvers [see e.g. 
[TBI for a discussion for an AMR Poisson solver], and is inherent to the one- 
way interface scheme. It could be avoided using higher order Laplacian and 
interpolation operators. Using third order operators would reduce the error 
at the interface to O (Ax) 2 . 

Fig. [TT] represents the local force error as a function of cell radius for 
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Figure 11: Relative error on the force in the 3-level setup described in the text, at resolution 
level 8. The values at cell centers are colored by level correspondingly to Fig. [9] On the 
left panel, the border potential at fine coarse boundaries is interpolated from the coarser 
level, as implemented in RAMSES with the one-way interface scheme. The interpolation 
results in a degradation of the force to order 1 within shallow regions around the AMR 
level transitions. On the right panel, the border potential is obtained from the analytic 
solution, illustrating a best-case solution on the given grid. 



the test problem at resolution level 8. The left panel shows the RAMSES 
prescription using interpolation for computing potential and force at level 



boundaries, according to Fig. [10J In the right panel, the missing potential 
values for both boundary conditions and force computation are no longer 
interpolated from the coarse level, but rather evaluated directly from the 
exact analytic solution, which suppresses any truncation error problem at the 
level interfaces. The comparison of the two panels shows that the impact of 
the one-way interface scheme on the global quality of the solution is minimal: 
The local force error never exceeds about 1%, and the first order degradation 
of the force is confined to thin shells of codimension 1. As a result, the L2 
convergence of the force is still close to order 2, scaling approximately as 
Ax 16 (see Fig.g). 

Note that although this test case is simple, it provides valuable insight on 
the properties of the one-way interface scheme coupled to the computation 
of the force. In the case of RAMSES, extensive tests of force in the code had 
been made for cosmological simulations in Teyssier [28J using the original 



Poisson solver. Errors on the force due to the one-way interface AMR were 
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typically found to be within 1%. Since the Laplacian operator is identical 
at the fine (AMR) level in our multigrid implementation to the one used 
for these tests, our solver would yield exactly the same solution when fully 
converged to numerical accuracy 

Finally, we have tested the convergence and accuracy of the solver in the 
presence of gradients at a level boundary. We used a concentrated mass 
distribution, given by Eq. ^ with ro = 0.01, centered in the computational 
box. Starting from a base resolution at level 10 (corresponding to a regular 
grid of 1024 2 cells), we introduced one level of refinement at level 11, delimited 
by a circular boundary of radius 0.25. By moving around the refined region 
within the box, we can probe the effect of the gradient on the solver as the 
central peak in the mass distribution approaches the boundary. For various 
positions of the refined level, we have performed multigrid iterations with 
our algorithm with a residual reduction factor set to 10~ 7 . The resulting 
truncation error for the fine level is plotted on Fig. 12 as a function of the 
boundary-mass center distance. 

At separations greater than about 3ro, the gradient induced by the mass 
distribution has no noticeable effect on the quality of the global solution, 
for both the potential and the force. As the mass distribution gets closer to 
the level boundary however, large derivatives at the interface will introduce 
errors in the Dirichlet boundary condition, which will impact the quality 
of the solution in the whole refined region. Note, nevertheless, that in our 
test the force seems much less sensitive to this effect, possibly because its 
truncation error is already dominated by the first order term arising at level 
boundaries. 

While this effect is inherently present in one-way interface methods, it 
could likely be reduced by using higher-order reconstructions of the Dirichlet 
boundary condition. As evidenced by the results on the force, however, it is 
not clear that much can be gained by such a sophistication, unless the order 
of the whole scheme is increased globally. Rather, these results highlight the 
importance of selecting suitable AMR refinement criteria. In computational 
astrophysics, AMR grids are often refined whenever the local density crosses 
a threshold. RAMSES combines such criteria with a level dilation strategy — 
which "grows" AMR level outwards, and helps reducing the granularity of 
the grid while limiting sharp variations of the mass distribution along the 
boundary. 

In terms of convergence rate, the boundary gradient had no noticeable 
effect, as the solver converged in 6 or 7 iterations for all of the data points 
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Figure 12: Effect of the gradient at a level boundary on the quality of the fine level solution, 
for the potential (circles) and the force (squares) . For boundary-center of mass separations 
greater than about 3ro, the location of the density peak has no sizable effect. As the peak 
gets closer to the boundary, stronger gradients introduce errors in the numerical potential. 
Note however that the errors on the force — which are dominated by the first-order terms 
arising at the level boundary — are comparatively insensitive to the boundary gradient. 



represented on Fig. 12, yielding a convergence factor of about 10. 



5.2. Performance in cosmological simulations 

The new Poisson solver has by now been used in RAMSES in a variety 
of simulations. We present timings for our new multigrid (MG) solver on 
Table [TJ together with corresponding timings for the conjugate gradient (CG) 
method, used here as a reference example of traditional iterative solvers. The 
timings represent the total wall clock time required to solve for the Poisson 
equation on the whole AMR grid, for a residual L 2 norm reduction factor of 
10 -3 . For the reference timings, we have set the initial guess of the solution 
to zero everywhere on the grid, for both solvers. 

The tests were run on the CEA/CCRT Platine computer, consisting of 
BULL NovaScale 3045 units totaling 932 nodes, networked by an InfiniBand 
interconnect. Each node hosts 4 Intel® Itanium® 2 dual-core 1.6 GHz pro- 
cessors sharing 24 Gb of RAM. 

Simulation A (a cosmological simulation at very early time) has a base 
256 3 grid, with no additional level of refinement, on 32 processors. Simula- 
tion B is a "zoom" simulation, with a base resolution of 128 3 and successive 
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Type of AMR grid 


PE 


Resolution 


^cells 


CG (s) 


MG (s) 


A. Cartesian 


32 


256 3 + levels 


19M 


52 


4.1 


B. Zoom- in 


64 


128 3 + 3 levels 


47M 


160 


15 


C. Cosmology 


1024 


1024 3 + 5 levels 


4.8G 


2750 


1070 



Tabic 1: Poisson solver timings on three reference simulations, for conjugate gradient (CG) 
and multigrid (MG). 



forced refinements up to a 1024 3 -equivalent zoom-in area. The cosmological 
computation follows the formation of a dense dark matter halo within the 
focus area. Simulation C is a clustered, 50/i _1 Mpc box cosmological simu- 
lation at z — 1, with 1024 3 particles. At this level of nonlinearity, the finer 
AMR levels are extremely clustered, while the intermediate levels exhibit a 
complex topology as they follow the intermediate-density structures (walls, 
filaments and clumps). 

All timings of Table [T] show a strong performance advantage of the multi- 
grid method over the conjugate gradient. Additionally, during the time evo- 
lution of cosmological simulations, we witnessed that the multigrid solver 
convergence times are much more predictable and consistent across different 
runs than the conjugate gradient. We attribute this effect to the fact that 
the multigrid performance only depends on the topology of the grid, which 
changes progressively during the simulations, whereas the conjugate gradient 
is very sensitive to the quality of the first guess. 

In the context of one-way interface solvers on AMR grids, we can signifi- 
cantly improve the performance of the conjugate gradient solver by comput- 
ing a first guess solution based on the coarser level solution. We choose to 
linearly interpolate the initial guess of $ at level i from the solution at the 
coarser level £—1, which has just been computed. This "multilevel" approach 
guarantees an initial guess of reasonable quality at a small extra cost — the 
cost of interpolating the solution from the coarser AMR level to the finer 
level. Note that for our CG implementation, iterations only take place at 
the finest level, and are therefore not multigrid-accelerated. We now discuss 
timings for our 3 test simulations using this improved CG solver. Since new 
multigrid timings have shown to be practically unchanged down to 2 digits, 
when using this new first guess, we only give new timings for the conjugate 
gradient solver. 

Simulation A features a CG time between 5.8 and 23 seconds. The con- 
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jugate gradient timings are particularly erratic on the first few timesteps, 
because the first cosmological structures form at very small scale in the mid- 
dle of a nearly uniform density field; therefore such small scale features are 
not accurately represented on the first guess obtained by interpolating the 
coarse solution. The number of iterations necessary to reach a given resid- 
ual reduction factor is therefore high at the start of the simulation, before 
decreasing significantly as larger structures grow. In any case, the multigrid 
method performs significantly better than the conjugate gradient on carte- 
sian grids, even though the conjugate gradient benefits dramatically from an 
optimal first guess, and has less overall overhead. 

On simulation B, the CG solver with the new first guess takes 140 sec- 
onds. The almost tenfold performance gain of the multigrid algorithm over 
the conjugate gradient can be explained by the fast evolution of the matter 
density at the coarse levels in the early stages of the simulation. Since coarse 
levels use a bigger time step than finer levels because of adaptive timestep- 
ping, the potential on coarse cells — which is interpolated as a first guess for 
the finer potentials — is updated less frequently. The finer first guesses thus 
tend to be out of synchronization with the real solution, resulting in addi- 
tional conjugate gradient iterations. The multigrid algorithm is much less 
sensitive to first guess quality, resulting in a significant advantage over the 
conjugate gradient. This situation shows the real strength of the new solver 
in the context of adaptive time stepping. 

Finally, in the case of simulation C, the CG solver runs for 850 seconds, 
about 20% less than multigrid. Decomposing the solver time by level shows 
that the MG solver spends most of its time dealing with very fine and very 
clustered grids, at the finest end of the AMR structure. This is easily under- 
stood, as this type of grid geometries represent a worst-case scenario for the 
multigrid solver in terms of small islands, forcing intermediate AMR levels 
into the slightly less efficient 1 st order capturing mode. Moreover, at this 
stage of the simulation, the timestep is usually extremely small, which is 
beneficial to the conjugate gradient as discussed in the case of simulation 
B. This result suggests using a hybrid scheme in practice, where one uses 
the new multigrid method for most levels of the AMR hierarchy except the 
finest ones, which can be handled by the CG solver. This method has been 
implemented in RAMSES. 

For most astrophysical applications, it is sufficient to improve the solution 
on each level until the residual norm reaches about 10 -3 times the norm of 
the initial solution, which is obtained by interpolation of the coarse solution. 
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Note that we use N pre = N post = 2. In the vast majority of practical 3D grid 
geometries encountered in cosmology and galaxy formation simulations, the 
multigrid algorithm performs 4 to 5 iterations, corresponding to convergence 
factors of about 4-6. We observe this behavior regardless of the resolution 
of the simulation, from simple 128 3 runs to full-scale simulations starting at 
1024 3 with deep AMR grids. In this sense, the multigrid performance is close 
to textbook multigrid convergence in practical situations. 

5.3. Effect ofN cycles 

The N cyc i es parameter controls how many multigrid iterations are per- 
formed when computing a coarse correction, at any level of the MG hierarchy. 
More iterations usually yield a better correction (and less multigrid iterations 
at the finest level before reaching tolerance), but significantly increase the 
cost of each iteration. One must therefore find an appropriate trade-off. 

Performing more than 2 or 3 cycles is usually not desirable, because the 
coarse problem is only itself an approximation of the fine correction problem. 
We have studied N cydes = 1 ("V-cycles"), N cydes = 2 ("W-cycles") and 
Ncydes = 3. We have measured the residual reduction rate and the total 
solver time to solve to a given accuracy (10~ 10 in our tests) for a simple disk 
domain as shown on Figure [7} 



Typical results are presented on Figure 13 The first conclusion is that 
V-cycles are very sensitive to the chosen boundary reconstruction scheme. 
First-order multigrid boundary reconstruction is clearly detrimental to the 
convergence rate, though it does ensure convergence of grids with small is- 
lands. Interestingly enough, schedules with N cydes > 2 (like W-cycles) ap- 
pear to be insensitive to the order of the boundary reconstruction scheme. 
This suggests that first-order multigrid boundary reconstruction used in con- 
junction with W-cycles would ensure a convergence of the solver as fast as 
the second-order scheme. Since W-cycles have more costly iterations, we 



see in Figure 13 that this additional cost translates however into a longer 
overall time. For most astrophysical applications we have explored with the 
RAMSES code, we have found that V-cycles perform generally better than 
W-cycles. This can however depend on the actual implementation of the 
solver, and the kind of grid geometries arising in other applications. 

5.4. Parallel computing 

The progress of distributed memory architectures over vector supercom- 
puters has led to a regain of interest in iterative methods, as direct global 
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Figure 13: Effect of the multigrid schedule on the solver, for a simple disk-shaped domain 
test case where second-order multigrid boundary reconstruction does converge. 



solvers such as FFT are particularly expensive in terms of inter-process com- 
munications. Iterative methods can often be adapted to distributed memory 
architectures, with little modification, and therefore remain simple to imple- 
ment, while requiring limited communications. 

A broad class of parallelization techniques for physical problems consists 
of splitting the computational domain into subregions, which are each man- 
aged and updated by a dedicated computing core. Such spatial domain de- 
composition techniques rely on the ability to update each CPU independently 
first, then address the couplings between different domains. 

In RAMSES, this last step is implemented using buffer regions (see Fig. 14). 
Each computing core manages its own cells, but also possesses a local copy 
of cells from other neighboring CPUs which are needed for local compu- 
tation. These buffer cells need to be updated after every iteration of the 
various iterative solvers used in the code. The update operation is done by 
communicating the updated values of the buffer cells from the CPUs which 
own them to the buffer regions in other processors. Therefore, any CPU only 
communicates with its direct neighbors, and the number of neighbors usually 
remains small. Moreover, the number of buffer cells scales only as a surface 
term, limiting the transfer to computation volume ratio. This is in contrast 
to the FFT, which requires a full transpose of the grid, and global all-to-all 
communications. In our multigrid scheme, we need to communicate both the 
solution and the residual for each the buffer cell between every Gauss-Seidel 
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CPU1 



CPU2 



Figure 14: Domain decomposition and buffer regions as used in RAMSES, in particular 
for the Poisson solver. The thick black line marks the boundary of the spatial domain 
decomposition between CPU 1 and CPU 2. CPU 1 owns all the red cells, while CPU 2 
owns all the blue cells. In order to perform an update, each CPU needs the values of the 
fields in the immediate exterior vicinity of its domain (light red and light blue for CPUs 
2 and 1 respectively). These buffer cells are updated after each iteration of the various 
solvers using inter-process communications. 



sweep, and also after each restriction or prolongation operation. 

We have performed weak and strong scaling timings of our multigrid 
Poisson solver in RAMSES. The strong scaling test case is a simple 256 3 
cosmological simulation without any refinement (Cartesian grid), starting at 
4 processes up to 512 processes. The weak scaling test scales from 256 3 with 
4 processes to 2048 3 with 2048 processes. The test results are presented on 



Figure 15 We see that our parallel efficiency degrades down to 50% when 
we reach 32 3 cells per processor. Beyond this limit, we spend more time 
communicating data than updating the solution during each Gauss-Seidel 
sweep. We could improve our scaling by a factor of 2 by hiding surface cells 
communications by inner cells computations. The weak scaling tests shows 
that if we keep the computational load above 64 3 cells per processor, the 
scaling is almost perfect. This rule of thumb applies also for more complex 
grid geometries, although load balancing can degrade significantly in case of 
very deep adaptive time step strategies. 
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Processes Processes 

Figure 15: Strong (left panel) and weak (right panel) scaling test of our multigrid Poisson 
solver for a 256 3 Cartesian grid. 

6. Conclusion 

We have presented a simple and efficient multigrid algorithm for solving 
the Poisson equation on irregular domains defined on a regular Cartesian 
grid. This kind of problem frequently arises in AMR codes using a one- 
way interface strategy, and the method is therefore of particular interest 
for astrophysical applications with multiple adaptive time steps. Using a 
second-order reconstruction scheme for the boundaries of the multigrid lev- 
els, we have shown that our multigrid scheme features optimal convergence 
properties. Since we use a multigrid hierarchy orthogonal to the actual AMR 
grid, our memory requirements are minimal. In case of particularly complex 
boundary conditions, we have observed that our scheme fails to converge, an 
issue previously identified as the "small island" problem. We have designed 
a simple fix to this problem, degrading our multigrid boundary reconstruc- 
tion accuracy to first-order. We have implemented this new technique in the 
RAMSES code, using a simple algorithm to determine "on the fly" which 
reconstruction technique should be used. We have shown the solver to be 
second-order accurate for the potential even in the presence of AMR levels, 
while the global force accuracy is close to second-order accuracy. Near level 
interfaces, the one-way interface scheme causes degradation of the force to 
first order, but only in very localized regions of co dimension one. We have 
measured significant performance gains over our standard conjugate gradi- 



29 



ent solver, especially in the case of cosmological "zoom-in" simulations, where 
large fully-refined domains are present in the AMR grid. This simple and 
efficient multigrid solver could also be used for incompressible flow solvers, 
in presence of multiphase fluids or complex embedded solid boundaries. 
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